Identification of malicious nodes in an AODV pure ad hoc network through guard nodes
Authors
Abstract
This paper presents a guard-node-based scheme to identify malicious nodes in the Ad hoc On-Demand Distance Vector (AODV) protocol. Each node calculates the trust level of its neighboring nodes for route selection. The trust calculation incorporates the opinions of other nodes about the node whose trust level is being determined. If a neighboring node's trust level falls below a predefined threshold, it is identified as malicious and is not considered for route selection.

Related work: This work provides a solution for identifying malicious nodes in wireless sensor networks through the detection of malicious message transmissions in a network. Ad hoc networks are an integral part of modern networking: an ad hoc network is an infrastructure-less network in which a node communicating with a neighboring node does not depend on any centralized infrastructure, and the network is dynamic in nature. A wireless ad hoc network is a decentralized type of wireless network; it is called "ad hoc" because it does not rely on a pre-existing infrastructure and has no access points. An ad hoc network typically refers to any set of networks in which all devices have equal status and are free to communicate with any other ad hoc network device in link range. In recent years, "ad hoc network" has also come to refer to a mode of operation of IEEE 802.11 wireless networks.

Ad hoc networks using the k-NN algorithm: Further, many ad hoc wireless nodes are powered by batteries with a limited lifetime; power constraints affect the signal processing and transmission of all nodes as well as the network lifetime. In pattern recognition, the k-Nearest Neighbors algorithm (k-NN for short) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space; the output depends on whether k-NN is used for classification or regression. In k-NN classification, the output is a class membership: an object is classified by a majority vote of its neighbors and assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, the object is simply assigned to the class of its single nearest neighbor. In k-NN regression, the output is the property value for the object, namely the average of the values of its k nearest neighbors. k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification; it is among the simplest of all machine learning algorithms. For both classification and regression, it can be useful to weight the contributions of the neighbors, so that nearer neighbors contribute more to the average than more distant ones; a common weighting scheme gives each neighbor a weight of 1/d, where d is the distance to the neighbor. The neighbors are taken from a set of objects for which the class (for k-NN classification) or the object property value (for k-NN regression) is known; this can be thought of as the training set for the algorithm, though no explicit training step is required. A shortcoming of the k-NN algorithm is that it is sensitive to the local structure of the data.
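To make the two voting schemes concrete, here is a minimal Python sketch of k-NN classification with an optional 1/d distance weighting; knn_predict and the toy data are illustrative, not from the paper, and Euclidean distance is assumed:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3, weighted=False):
    """Classify x_query by a vote of its k nearest neighbors (Euclidean)."""
    dists = np.linalg.norm(X_train - x_query, axis=1)  # distance to every training point
    nearest = np.argsort(dists)[:k]                    # indices of the k closest examples
    if not weighted:
        # Plain majority vote among the k nearest labels.
        return Counter(y_train[nearest]).most_common(1)[0][0]
    # 1/d weighting: nearer neighbors contribute more to the vote.
    votes = {}
    for i in nearest:
        w = 1.0 / (dists[i] + 1e-12)                   # guard against division by zero
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + w
    return max(votes, key=votes.get)

# Toy data: two classes in a 2-D feature space.
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1], [1.2, 0.8]])
y = np.array([0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([0.2, 0.1]), k=3))                 # -> 0
print(knn_predict(X, y, np.array([1.0, 0.9]), k=3, weighted=True))  # -> 1
```

With weighted=True, a nearby minority of neighbors can outvote a more distant majority, which is exactly the effect the 1/d scheme is meant to provide.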
Example of k-NN classification: a test sample (green circle) should be classified either into the first class of blue squares or into the second class of red triangles. If k = 3 (solid-line circle) it is assigned to the second class, because there are 2 triangles and only 1 square inside the inner circle. If k = 5 (dashed-line circle) it is assigned to the first class (3 squares vs. 2 triangles inside the outer circle).

The training examples are vectors in a multidimensional feature space, each with a class label. The training phase of the algorithm consists only of storing the feature vectors and class labels of the training samples. In the classification phase, k is a user-defined constant, and an unlabeled vector (a query or test point) is classified by assigning the label that is most frequent among the k training samples nearest to that query point. A commonly used distance metric for continuous variables is Euclidean distance. For discrete variables, such as in text classification, another metric can be used, such as the overlap metric (or Hamming distance). In the context of gene-expression microarray data, for example, k-NN has also been employed with correlation coefficients such as Pearson and Spearman. The classification accuracy of k-NN can often be improved significantly if the distance metric is learned with specialized algorithms such as Large Margin Nearest Neighbor or Neighbourhood Components Analysis.

A drawback of the basic "majority voting" classification occurs when the class distribution is skewed: examples of a more frequent class tend to dominate the prediction of a new example, because they tend to be common among the k nearest neighbors simply due to their large number. One way to overcome this problem is to weight the classification, taking into account the distance from the test point to each of its k nearest neighbors: the class (or value, in regression problems) of each of the k nearest points is multiplied by a weight proportional to the inverse of its distance to the test point. Another way to overcome skew is abstraction in the data representation. For example, in a self-organizing map (SOM), each node is a representative (a center) of a cluster of similar points, regardless of their density in the original training data; k-NN can then be applied to the SOM.

Parameter selection: The best choice of k depends upon the data. Generally, larger values of k reduce the effect of noise on the classification but make boundaries between classes less distinct. A good k can be selected by various heuristic techniques (see hyperparameter optimization). The special case where the class is predicted to be the class of the closest training sample (i.e. k = 1) is called the nearest neighbor algorithm. The accuracy of the k-NN algorithm can be severely degraded by the presence of noisy or irrelevant features, or if the feature scales are not consistent with their importance. Much research effort has been put into selecting or scaling features to improve classification; a particularly popular approach is the use of evolutionary algorithms to optimize feature scaling, and another is to scale features by the mutual information of the training data with the training classes. In binary (two-class) classification problems, it is helpful to choose k to be an odd number, as this avoids tied votes. One popular way of choosing the empirically optimal k in this setting is via the bootstrap method.
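As one way to pick k empirically, here is a small sketch restricted to odd candidates (avoiding tied votes, per the advice above); it scores each k by leave-one-out accuracy rather than the bootstrap mentioned in the text, a deliberate substitution for brevity, and loo_accuracy and pick_k are illustrative names:

```python
import numpy as np

def loo_accuracy(X, y, k):
    """Leave-one-out accuracy of a plain k-NN majority vote."""
    correct = 0
    for i in range(len(X)):
        dists = np.linalg.norm(X - X[i], axis=1)
        dists[i] = np.inf                    # exclude the held-out point itself
        nearest = np.argsort(dists)[:k]
        labels, counts = np.unique(y[nearest], return_counts=True)
        if labels[np.argmax(counts)] == y[i]:
            correct += 1
    return correct / len(X)

def pick_k(X, y, candidates=(1, 3, 5, 7, 9)):
    """Return the odd k with the best leave-one-out accuracy (ties favor smaller k)."""
    return max(candidates, key=lambda k: (loo_accuracy(X, y, k), -k))
```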
Feature extraction: When the input data to an algorithm is too large to be processed and is suspected to be redundant (e.g. the same measurement in both feet and meters), the input is transformed into a reduced representation set of features (also called a feature vector). Transforming the input data into this set of features is called feature extraction. If the extracted features are carefully chosen, the feature set is expected to capture the relevant information from the input data, so that the desired task can be performed using this reduced representation instead of the full-size input. Feature extraction is performed on the raw data before the k-NN algorithm is applied in the transformed feature space. An example of a typical computer-vision pipeline for face recognition using k-NN, including feature-extraction and dimension-reduction pre-processing steps (usually implemented with OpenCV): 1. Haar face detection; 2. mean-shift tracking analysis; 3. PCA or Fisher LDA projection into feature space, followed by k-NN classification.

Data reduction: Data reduction is one of the most important problems in working with huge data sets. Usually, only some of the data points are needed for accurate classification. Those points are called the prototypes and can be found as follows: 1. select the class outliers, that is, training data that are classified incorrectly by k-NN (for a given k); 2. separate the rest of the data into two sets: (i) the prototypes, which are used for the classification decisions, and (ii) the absorbed points, which can be correctly classified by k-NN using the prototypes. The absorbed points can then be removed from the training set.

CNN for data reduction: Condensed nearest neighbor (CNN, the Hart algorithm) is an algorithm designed to reduce the data set for k-NN classification. It selects a set of prototypes U from the training data such that 1NN with U can classify the examples almost as accurately as 1NN does with the whole data set. Given a training set X, CNN works iteratively: 1. scan all elements of X, looking for an element x whose nearest prototype from U has a different label than x; 2. remove x from X and add it to U; 3. repeat the scan until no more prototypes are added to U. Use U instead of X for classification; the examples that are not prototypes are called "absorbed" points. It is efficient to scan the training examples in order of decreasing border ratio. The border ratio of a training example x is defined as a(x) = ||x'-y|| / ||x-y||, where ||x-y|| is the distance to the closest example y having a different label than x, and ||x'-y|| is the distance from y to its closest example x' with the same label as x. The border ratio lies in the interval [0,1] because ||x'-y|| never exceeds ||x-y||. This ordering gives preference to the borders of the classes for inclusion in the set of prototypes U; a point with a different label than x is called external to x. The calculation of the border ratio is illustrated by a figure showing the three types of points (prototypes, class outliers, and absorbed points): the data points are labeled by colors, the initial point x is red, external points are blue and green, the closest external point to x is y, the closest red point to y is x', and the ratio a(x) = ||x'-y|| / ||x-y|| is stored as the attribute of the initial point x.
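The following Python sketch implements Hart's condensed nearest neighbor with the decreasing-border-ratio scan described above. The helper names (border_ratio, condensed_nn) are ours, and seeding U with the first scanned point is one reasonable choice rather than anything prescribed by the text:

```python
import numpy as np

def border_ratio(X, y, i):
    """a(x) = ||x'-y|| / ||x-y||: y is the closest example to x with a
    different label; x' is the closest example to y sharing x's label."""
    ext = np.where(y != y[i])[0]                                   # external points
    j = ext[np.argmin(np.linalg.norm(X[ext] - X[i], axis=1))]      # y
    same = np.where(y == y[i])[0]
    xp = same[np.argmin(np.linalg.norm(X[same] - X[j], axis=1))]   # x'
    return np.linalg.norm(X[xp] - X[j]) / np.linalg.norm(X[i] - X[j])

def condensed_nn(X, y):
    """Hart's CNN: grow a prototype set U until 1NN with U labels every
    training example correctly; the remaining points are 'absorbed'."""
    # Scanning in decreasing border-ratio order favors points near class
    # borders for inclusion in U, as the text recommends.
    order = sorted(range(len(X)), key=lambda i: -border_ratio(X, y, i))
    keep = [order[0]]                          # seed U with the first scanned point
    changed = True
    while changed:
        changed = False
        for i in order:
            if i in keep:
                continue
            d = np.linalg.norm(X[keep] - X[i], axis=1)
            if y[keep][np.argmin(d)] != y[i]:  # U misclassifies x_i: promote it
                keep.append(i)
                changed = True
    return X[keep], y[keep]                    # the prototypes
```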
Below is an illustration of CNN in a series of figures, with three classes (red, green, and blue). Fig. 1: initially there are 60 points in each class. Fig. 2 shows the 1NN classification map: each pixel is classified by 1NN using all the data. Fig. 3 shows the 5NN classification map; white areas correspond to unclassified regions, where the 5NN vote is tied (for example, if there are two green, two red, and one blue point among the 5 nearest neighbors). Fig. 4 shows the reduced data set: the crosses are the class outliers selected by the (3,2)NN rule (all three nearest neighbors of these instances belong to other classes), the squares are the prototypes, and the empty circles are the absorbed points. The bottom-left corner shows the numbers of class outliers, prototypes, and absorbed points for all three classes; the number of prototypes varies from 15% to 20% across classes in this example. Fig. 5 shows that the 1NN classification map built from the prototypes is very similar to the one built from the initial data set. The figures were produced using the Mirkes applet. [Figures 1-5, CNN model reduction for k-NN classifiers: the dataset; the 1NN classification map; the 5NN classification map; the CNN-reduced dataset; the 1NN classification map based on the CNN-extracted prototypes.]

Neighbor joining: Starting with a star tree (A), the Q matrix is calculated and used to choose a pair of nodes for joining, in this case f and g. These are joined to a newly created node u, as shown in (B). The part of the tree shown as solid lines is now fixed and will not be changed in subsequent joining steps. The distances from node u to the nodes a-e are computed from equation (3) below. This process is then repeated using a matrix of just the distances between the nodes a, b, c, d, e, and u, and a Q matrix derived from it. In this case u and e are joined to the newly created node v, as shown in (C). Two more iterations lead first to (D) and then to (E), at which point the algorithm is done, as the tree is fully resolved.

Neighbor joining takes as input a distance matrix specifying the distance between each pair of taxa. The algorithm starts with a completely unresolved tree, whose topology corresponds to that of a star network, and iterates over the following steps until the tree is completely resolved and all branch lengths are known: 1. based on the current distance matrix, calculate the matrix Q (defined below); 2. find the pair of distinct taxa i and j (i.e. with i ≠ j) for which Q(i,j) has its lowest value; these taxa are joined to a newly created node, which is connected to the central node (in the figure described above, f and g are joined to the new node u); 3. calculate the distance from each taxon in the pair to this new node; 4. calculate the distance from each taxon outside the pair to the new node; 5. start the algorithm again, replacing the pair of joined neighbors with the new node and using the distances calculated in the previous step.

The Q matrix: Based on a distance matrix relating the n taxa, calculate Q as

Q(i,j) = (n - 2)·d(i,j) - Σ_k d(i,k) - Σ_k d(j,k),   (1)

where d(i,j) is the distance between taxa i and j.

Distance from the pair members to the new node: For each of the two taxa f and g being joined into the new node u, use the following formula to calculate the distance to the new node:

δ(f,u) = ½·d(f,g) + [Σ_k d(f,k) - Σ_k d(g,k)] / (2(n - 2)),   (2)

with δ(g,u) = d(f,g) - δ(f,u); and the distance from each remaining taxon k to u is

δ(u,k) = ½·[d(f,k) + d(g,k) - d(f,g)].   (3)
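As a sketch of one joining step under equations (1)-(3), assuming a symmetric distance matrix held as a NumPy array (q_matrix and join_step are illustrative names, and the 4-taxon matrix is a standard textbook example rather than data from this paper):

```python
import numpy as np

def q_matrix(D):
    """Equation (1): Q(i,j) = (n-2)*d(i,j) - sum_k d(i,k) - sum_k d(j,k)."""
    n = D.shape[0]
    row = D.sum(axis=1)
    Q = (n - 2) * D - row[:, None] - row[None, :]
    np.fill_diagonal(Q, np.inf)            # a taxon is never joined with itself
    return Q

def join_step(D):
    """One iteration: join the pair (f, g) minimizing Q into a new node u and
    return the branch lengths (eq. (2)) plus the distances from u (eq. (3))."""
    n = D.shape[0]
    f, g = np.unravel_index(np.argmin(q_matrix(D)), D.shape)
    row = D.sum(axis=1)
    d_fu = 0.5 * D[f, g] + (row[f] - row[g]) / (2 * (n - 2))   # eq. (2)
    d_gu = D[f, g] - d_fu
    d_uk = 0.5 * (D[f] + D[g] - D[f, g])   # eq. (3); entries at f, g are unused
    return f, g, d_fu, d_gu, d_uk

# Textbook 4-taxon example (an additive distance matrix).
D = np.array([[0., 5., 9., 9.],
              [5., 0., 10., 10.],
              [9., 10., 0., 8.],
              [9., 10., 8., 0.]])
f, g, d_fu, d_gu, d_uk = join_step(D)
print(f, g, d_fu, d_gu)   # joins taxa 0 and 1 with branch lengths 2.0 and 3.0
```

On this matrix the first step joins taxa 0 and 1 (a pair attaining the lowest Q value); repeating the step on the reduced matrix resolves the tree.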
Similar resources
BeeID: intrusion detection in AODV-based MANETs using artificial Bee colony and negative selection algorithms
Mobile ad hoc networks (MANETs) are multi-hop wireless networks of mobile nodes constructed dynamically without the use of any fixed network infrastructure. Due to inherent characteristics of these networks, malicious nodes can easily disrupt the routing process. A traditional approach to detect such malicious network activities is to build a profile of the normal network traffic, and then iden...
Using clustering in the AODV routing protocol for vehicular ad hoc networks in a highway scenario
Vehicular ad hoc networks are a subset of mobile ad hoc networks in which vehicles are considered as network nodes. Their major difference is the rapid mobility of nodes, which causes quick changes of topology in the network. Quick topology changes are a big challenge for routing, so routing protocols in these networks must be robust and reliable. AODV Routing pr...
GNB-AODV: Guard Node Based-AODV to Mitigate Black Hole Attack in MANET
In a multi-hop wireless ad hoc network, packets are transmitted through intermediate nodes to reach the destination. In this topological structure there is no centralized coordinating, monitoring, or control point. In such a network environment, an intermediate node can act either selfishly or maliciously and drop packets. The primary objective of such an untrustworthy behavior of th...
Secure Routing Protocol: Affection on MANETs Performance
In mobile ad hoc networks, the absence of infrastructure and the consequent absence of authorization facilities impede the usual practice of establishing a practical criterion for distinguishing nodes as trusted and distrusted. Since all nodes in MANETs would be used as routers in multi-hop applications, secure routing protocols have a vital role in the security of the network. So evaluating the perf...
Improving the AODV protocol to counter wormhole attacks in ad hoc networks
Mobile Ad hoc Networks (MANET) are vulnerable to both active and passive attacks. The wormhole attack is one of the most severe security attacks in wireless ad hoc networks, an attack that can be mounted on a wide range of wireless network protocols without compromising any cryptographic quantity or network node. In Wormhole attacks, one malicious node tunnels packets from its location to the ...
Securing AODV routing protocol against the black hole attack using Firefly algorithm
Mobile ad hoc networks are networks composed of wireless devices that form a network with the ability to self-organize. These networks are designed as a new generation of computer networks to satisfy specific requirements, with features different from wired networks. These networks have no fixed communication infrastructure, and for communication with other nodes the intermediate no...
Journal:
Computer Communications
Volume 31, issue -
Pages: -
Publication date: 2008